Sparse gaussian processes for large-scale machine learning
Gaussian Processes (GPs) are non-parametric, Bayesian models able to achieve state-of-the-art performance in supervised learning tasks such as non-linear regression and classification, thus being used as building blocks for more sophisticated machine learning applications. GPs also enjoy a number of other desirable properties: They are virtually overfitting-free, have sound and convenient model selection procedures, and provide so-called “error bars”, i.e., estimations of their predictions’ uncertainty. Unfortunately, full GPs cannot be directly applied to real-world, large-scale data sets due to their high computational cost. For n data samples, training a GP requires O(n³) computation time, which renders modern desktop computers unable to handle databases with more than a few thousand instances. Several sparse approximations that scale linearly with the number of data samples have been recently proposed, with the Sparse Pseudo-inputs GP (SPGP) representing the current state of the art. Sparse GP approximations can be used to deal with large databases, but, of course, do not usually achieve the performance of full GPs. In this thesis we present several novel sparse GP models that compare favorably with SPGP, both in terms of predictive performance and error bar quality. Our models converge to the full GP under some conditions, but our goal is not so much to faithfully approximate full GPs as it is to develop useful models that provide high-quality probabilistic predictions. By doing so, even full GPs are occasionally outperformed. We provide two broad classes of models: Marginalized Networks (MNs) and Inter-Domain GPs (IDGPs). MNs can be seen as models that lie in between classical Neural Networks (NNs) and full GPs, trying to combine the advantages of both.
Though trained differently, when used for prediction they retain the structure of classical NNs, so they can be interpreted as a novel way to train a classical NN, while adding the benefit of input-dependent error bars and overfitting resistance. IDGPs generalize SPGP by allowing the “pseudo-inputs” to lie in a different domain, thus adding extra flexibility and performance. Furthermore, they provide a convenient probabilistic framework in which previous sparse methods can be more easily understood. All the proposed algorithms are tested and compared with the current state of the art on several standard, large-scale data sets with different properties. Their strengths and weaknesses are also discussed and compared, so that it is easier to select the best-suited candidate for each potential application.
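The computational contrast described in the abstract can be made concrete. The sketch below (an illustration only, not the thesis's MN or IDGP models; the RBF kernel, hyperparameters, and inducing-point placement are assumptions) shows exact GP regression, whose n×n Cholesky factorization costs O(n³), next to a classical subset-of-regressors sparse approximation that costs O(nm²) for m inducing inputs:

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # squared-exponential kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_exact(X, y, Xs, noise=0.05):
    # full GP posterior: the Cholesky of the n x n kernel matrix is O(n^3)
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(Xs, X)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    # predictive variance: the input-dependent "error bars"
    var = rbf(Xs, Xs).diagonal() + noise - (v ** 2).sum(0)
    return mean, var

def gp_sor(X, y, Xs, Z, noise=0.05, jitter=1e-8):
    # subset-of-regressors sparse predictive mean with m inducing inputs Z:
    # only m x m systems are solved, so the cost is O(n m^2)
    Kmn = rbf(Z, X)
    Kmm = rbf(Z, Z) + jitter * np.eye(len(Z))
    A = Kmn @ Kmn.T + noise * Kmm
    return rbf(Xs, Z) @ np.linalg.solve(A, Kmn @ y)
```

With a smooth kernel and well-spread inducing inputs, the sparse mean tracks the exact posterior mean closely while avoiding the cubic cost.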
Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables
Probabilistic graphical models (PGMs) provide a compact representation of
knowledge that can be queried in a flexible way: after learning the parameters
of a graphical model once, new probabilistic queries can be answered at test
time without retraining. However, when using undirected PGMs with hidden
variables, two sources of error typically compound in all but the simplest
models: (a) learning error (both computing the partition function and
integrating out the hidden variables is intractable); and (b) prediction error
(exact inference is also intractable). Here we introduce query training (QT), a
mechanism to learn a PGM that is optimized for the approximate inference
algorithm that will be paired with it. The resulting PGM is a worse model of
the data (as measured by the likelihood), but it is tuned to produce better
marginals for a given inference algorithm. Unlike prior works, our approach
preserves the querying flexibility of the original PGM: at test time, we can
estimate the marginal of any variable given any partial evidence. We
demonstrate experimentally that QT can be used to learn a challenging
8-connected grid Markov random field with hidden variables and that it
consistently outperforms the state-of-the-art AdVIL when tested on three
undirected models across multiple datasets.
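The core idea, optimizing model parameters for a fixed approximate inference procedure, can be illustrated with a toy sketch (a hypothetical stand-in, not the paper's QT algorithm or AdVIL): fix a small number of parallel mean-field sweeps as the inference routine, then tune the parameters of a binary pairwise MRF so that the *approximate* marginals match targets, here with finite-difference gradients in place of backpropagation:

```python
import numpy as np

def mean_field_marginals(theta, W, n_steps=10):
    # approximate marginals E[x_i] of a binary (+/-1) pairwise MRF,
    # computed by a fixed number of parallel mean-field sweeps
    q = np.zeros_like(theta)
    for _ in range(n_steps):
        q = np.tanh(theta + W @ q)
    return q

def query_train(targets, n_vars, lr=0.2, iters=400, eps=1e-4):
    # tune unary fields and pairwise couplings so the *approximate*
    # marginals match the targets (finite differences for simplicity)
    n_pair = n_vars * (n_vars - 1) // 2
    params = np.zeros(n_vars + n_pair)

    def loss(p):
        theta, w = p[:n_vars], p[n_vars:]
        W = np.zeros((n_vars, n_vars))
        W[np.triu_indices(n_vars, 1)] = w
        W = W + W.T
        q = mean_field_marginals(theta, W)
        return ((q - targets) ** 2).mean()

    for _ in range(iters):
        grad = np.zeros_like(params)
        for j in range(len(params)):
            e = np.zeros_like(params)
            e[j] = eps
            grad[j] = (loss(params + e) - loss(params - e)) / (2 * eps)
        params -= lr * grad
    return params, loss(params)
```

The trained parameters need not be a good density model; they are chosen so that this particular truncated inference scheme returns accurate marginals, which is the trade-off the abstract describes.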
Support vector machines with constraints for sparsity in the primal parameters
This paper introduces a new support vector machine (SVM) formulation to obtain sparse solutions in the primal SVM parameters, providing a new method for feature selection based on SVMs. This new approach adds, to the classical constraints, additional ones that drop the weights associated with those features that are likely to be irrelevant. A ν-SVM formulation has been used, where ν indicates the fraction of features to be considered. This paper presents two versions of the proposed sparse classifier, a 2-norm SVM and a 1-norm SVM, the latter having a reduced computational burden with respect to the former. Additionally, an explanation is provided about how the presented approach can be readily extended to multiclass classification or to problems where groups of features, rather than isolated features, need to be selected. The algorithms have been tested on a variety of synthetic and real data sets and compared against other state-of-the-art SVM-based linear feature selection methods, such as the 1-norm SVM and the doubly regularized SVM. The results show the good feature selection ability of the approaches. This work was supported in part by the Ministry of Science and Innovation (Spanish Government), under Grant TEC2008-02473.
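For context, the 1-norm SVM baseline mentioned above can be written as a linear program: splitting w into nonnegative parts makes the L1 penalty linear, and the hinge loss becomes slack variables. The sketch below is that classical formulation (not the paper's ν-constrained method), solved with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, lam=0.1):
    # 1-norm soft-margin SVM as an LP:
    #   min  lam * ||w||_1 + sum(xi)
    #   s.t. y_i (w.x_i + b) >= 1 - xi_i,  xi >= 0
    # variables: [w+ (d), w- (d), b+, b-, xi (n)], all nonnegative
    n, d = X.shape
    c = np.concatenate([lam * np.ones(2 * d), [0.0, 0.0], np.ones(n)])
    Yx = y[:, None] * X
    # margin constraints rewritten as A @ z <= -1
    A = np.hstack([-Yx, Yx, -y[:, None], y[:, None], -np.eye(n)])
    res = linprog(c, A_ub=A, b_ub=-np.ones(n),
                  bounds=[(0, None)] * len(c), method="highs")
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d] - z[2 * d + 1]
    return w, b
```

Because the L1 penalty is linear, the LP solution typically zeroes out weights on uninformative features, which is the sparsity-in-the-primal behavior the paper builds on.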
3D Neural Embedding Likelihood for Robust Probabilistic Inverse Graphics
The ability to perceive and understand 3D scenes is crucial for many
applications in computer vision and robotics. Inverse graphics is an appealing
approach to 3D scene understanding that aims to infer the 3D scene structure
from 2D images. In this paper, we introduce probabilistic modeling to the
inverse graphics framework to quantify uncertainty and achieve robustness in 6D
pose estimation tasks. Specifically, we propose 3D Neural Embedding Likelihood
(3DNEL) as a unified probabilistic model over RGB-D images, and develop
efficient inference procedures on 3D scene descriptions. 3DNEL effectively
combines learned neural embeddings from RGB with depth information to improve
robustness in sim-to-real 6D object pose estimation from RGB-D images.
Performance on the YCB-Video dataset is on par with state-of-the-art yet is
much more robust in challenging regimes. In contrast to discriminative
approaches, 3DNEL's probabilistic generative formulation jointly models
multi-object scenes, quantifies uncertainty in a principled way, and handles
object pose tracking under heavy occlusion. Finally, 3DNEL provides a
principled framework for incorporating prior knowledge about the scene and
objects, which allows natural extension to additional tasks like camera pose
tracking from video.
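As a rough illustration of how a generative RGB-D likelihood can combine the two cues (a hypothetical construction for intuition only, not 3DNEL's actual model), a candidate pose can be scored by rendering it and summing a per-pixel embedding-similarity term with a Gaussian depth-discrepancy term; the best pose is then the argmax over candidates:

```python
import numpy as np

def score_pose(rendered_emb, rendered_depth, obs_emb, obs_depth,
               depth_sigma=0.02, emb_weight=1.0):
    # per-pixel cosine similarity of unit-norm embeddings (RGB cue)
    cos = (rendered_emb * obs_emb).sum(axis=-1)
    # Gaussian log-density of the depth discrepancy (depth cue)
    depth_ll = -0.5 * ((rendered_depth - obs_depth) / depth_sigma) ** 2
    # joint per-pixel score, summed over the image
    return float((emb_weight * cos + depth_ll).sum())
```

Scoring renders of many candidate poses under one probabilistic model, rather than regressing a pose directly, is what allows uncertainty quantification and joint reasoning over multi-object scenes.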
Graph schemas as abstractions for transfer learning, inference, and planning
We propose schemas as a model for abstractions that can be used for rapid
transfer learning, inference, and planning. Common structured representations
of concepts and behaviors -- schemas -- have been proposed as a powerful way to
encode abstractions. Latent graph learning is emerging as a new computational
model of the hippocampus to explain map learning and transitive inference. We
build on this work to show that learned latent graphs in these models have a
slot structure -- schemas -- that allow for quick knowledge transfer across
environments. In a new environment, an agent can rapidly learn new bindings
from the sensory stream to multiple latent schemas and select the best
fitting one to guide behavior. To evaluate these graph schemas, we use two
previously published challenging tasks: the memory & planning game and one-shot
StreetLearn, which are designed to test rapid task solving in novel
environments. Graph schemas can be learned in far fewer episodes than previous
baselines, and can model and plan in a few steps in novel variations of these
tasks. We further demonstrate learning, matching, and reusing graph schemas in
navigation tasks in more challenging environments with aliased observations and
size variations, and show how different schemas can be composed to model larger
2D and 3D environments.
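The binding-and-selection idea can be sketched in miniature (a toy illustration, not the paper's hippocampal model): represent each schema as an action-labeled transition graph with unbound slots; in a new environment, bind slots to observations on the fly and select the schema whose structure best explains the episode, even when observations are aliased:

```python
def fit_score(schema, start, episode):
    # schema: dict node -> {action: next_node}; episode: [(action, obs), ...]
    # Walk the schema, binding each newly visited node (slot) to the
    # observation seen there; score = consistent re-observations,
    # or -1 if the schema cannot execute the action sequence.
    binding, node, score = {}, start, 0
    for action, obs in episode:
        node = schema.get(node, {}).get(action)
        if node is None:
            return -1
        if node in binding:
            score += int(binding[node] == obs)
        else:
            binding[node] = obs  # rapid new binding, no relearning
    return score

def best_schema(schemas, start, episode):
    # pick the schema whose slot structure best fits the episode
    return max(schemas, key=lambda name: fit_score(schemas[name], start, episode))
```

Aliasing (the same observation at several nodes) is handled naturally: consistency is checked per slot, not per observation label, so a ring with repeated observations still outscores a structurally incompatible schema.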